The following report details the methods used to determine appropriate filter thresholds for SNV variant calls.
At the site level, three major filters were applied to obtain a high-quality variant set:
- QD - variant quality normalized by read depth
- SOR - strand odds ratio
- FS - Fisher strand

To find the optimal filter thresholds, we created simulated data in which the positions of real variants are known, and examined each metric independently and in combination.
Note: filter thresholds were only optimized for SNVs.
N2.bam) using bamsurgeon (20d431e). The genome browser snapshot below shows an inserted variant (in red).

wi-gatk-nf pipeline.

To reduce complexity, the filter thresholds were optimized independently.
The optimal QD threshold was determined as follows:
QD value passed the threshold. For example, the table below illustrates a few variants with a filter threshold QD > 10. Variants with QD that failed the filter are classified as undetected.

| CHROM | POS | QD | sim1_genotype | sim2_genotype | sim3_genotype | ==> | pass_QD_filter | is_detected |
|---|---|---|---|---|---|---|---|---|
| I | 1352 | 110 | 1/1 | 1/1 | 1/1 | QD threshold is 10 | yes | yes |
| I | 2566 | 90 | 1/1 | 0/0 | 1/1 | QD threshold is 10 | yes | yes |
| I | 3847 | 2 | 0/0 | 1/1 | 0/0 | QD threshold is 10 | no | no |
| I | 4975 | 38 | 1/1 | 0/0 | 0/0 | QD threshold is 10 | no | no |
| I | 5590 | 298 | 1/1 | 1/1 | 1/1 | QD threshold is 10 | yes | yes |

| CHROM | POS | is_detected | is_in_truth | category |
|---|---|---|---|---|
| I | 1352 | yes | yes | true positive |
| I | 2566 | yes | no | false positive |
| I | 3847 | no | no | true negative |
| I | 4975 | no | yes | false negative |
| I | 5590 | yes | yes | true positive |
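
The classification in the table above can be sketched in a few lines of Python. This is only an illustration of the logic, not the code used for the analysis: the function name `classify` and the tuple layout are hypothetical, and the rows are copied from the example table.

```python
def classify(is_detected: bool, is_in_truth: bool) -> str:
    """Map a (detected, in truth set) pair to its category."""
    if is_detected and is_in_truth:
        return "true positive"
    if is_detected:
        return "false positive"
    if is_in_truth:
        return "false negative"
    return "true negative"


# Rows from the example table: (CHROM, POS, is_detected, is_in_truth)
variants = [
    ("I", 1352, True, True),
    ("I", 2566, True, False),
    ("I", 3847, False, False),
    ("I", 4975, False, True),
    ("I", 5590, True, True),
]

# Key each variant by (CHROM, POS) and store its category
categories = {
    (chrom, pos): classify(detected, in_truth)
    for chrom, pos, detected, in_truth in variants
}

for (chrom, pos), category in categories.items():
    print(chrom, pos, category)
```

Keying on `(CHROM, POS)` assumes each site appears once, which holds for the example rows.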

| | in_truth | not_in_truth |
|---|---|---|
| detected | count of true positive | count of false positive |
| not_detected | count of false negative | count of true negative |
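
As a rough sketch of how the four cells of this matrix could be tallied, the snippet below counts each category from (is_detected, is_in_truth) pairs. The function name `confusion_counts` and the short labels `TP`, `FP`, `FN`, `TN` are assumptions made for illustration; the input pairs are the five rows of the example classification table.

```python
from collections import Counter


def confusion_counts(calls):
    """Count TP/FP/FN/TN from (is_detected, is_in_truth) pairs."""
    labels = {
        (True, True): "TP",    # detected, in truth set
        (True, False): "FP",   # detected, not in truth set
        (False, True): "FN",   # not detected, in truth set
        (False, False): "TN",  # not detected, not in truth set
    }
    counts = Counter(labels[(bool(d), bool(t))] for d, t in calls)
    # Report every cell, including those with a zero count
    return {name: counts.get(name, 0) for name in ("TP", "FP", "FN", "TN")}


# (is_detected, is_in_truth) for positions 1352, 2566, 3847, 4975, 5590
example_calls = [
    (True, True),
    (True, False),
    (False, False),
    (False, True),
    (True, True),
]

counts = confusion_counts(example_calls)
print(counts)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 1}
```

Running the same tally once per candidate threshold would give one such matrix per threshold value, which is one way the thresholds could be compared.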

QD filter.

SOR threshold.

FS threshold.